doi: 10.17586/2226-1494-2019-19-2-299-305


CRITERIA FOR TEXT CONFORMITY TO SCIENTIFIC STYLE 

E. I. Blees, M. M. Zaslavskiy


Article in Russian

For citation:
Blees E.I., Zaslavskiy M.M. Criteria for text conformity to scientific style. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2019, vol. 19, no. 2, pp. 299–305 (in Russian). doi: 10.17586/2226-1494-2019-19-2-299-305


Abstract

This paper studies criteria for assessing whether a text conforms to scientific style. The criteria examined are the repetition rate of keywords and key phrases in a document, the percentage of stop words relative to the total number of words in the text, and the deviation of the text's word-frequency curve from the ideal Zipf curve. The study used an executable script that checks a text against several criteria. In an experiment on a sample of 2500 articles published in HAC/RSCI sources, the distributions of the criteria values were obtained, tested for normality with several tests, and checked for correlation with one another. Based on the analysis of these data, threshold values for the criteria were derived and mathematically substantiated. The thresholds were then applied to a test sample consisting of undergraduate works by students of St. Petersburg Electrotechnical University “LETI”, the pseudoscientific article “Rooter: A Methodology for the Typical Unification of Access Points and Redundancy”, technical articles from the Habr Internet IT community, “Capital” by Karl Marx, and a number of other texts not written in scientific style. As a result, a necessary but not sufficient condition for an article's conformity to scientific style was formulated.
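The three criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' script: the stop-word list, the function names, the definition of keyword rate as the share of the `top_k` most frequent non-stop words, and the definition of Zipf deviation as the mean relative distance from the curve f(r) = f(1)/r are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper's list is not given here).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def tokenize(text):
    """Split text into lowercase words made of letters only."""
    return re.findall(r"[^\W\d_]+", text.lower())


def stop_word_ratio(tokens, stop_words=STOP_WORDS):
    """Fraction of tokens that are stop words."""
    if not tokens:
        return 0.0
    return sum(t in stop_words for t in tokens) / len(tokens)


def keyword_rate(tokens, top_k=3, stop_words=STOP_WORDS):
    """Share of all tokens taken by the top_k most frequent non-stop words.

    Hypothetical definition: the abstract does not say how repetition
    rate is computed.
    """
    content = [t for t in tokens if t not in stop_words]
    if not content:
        return 0.0
    top = Counter(content).most_common(top_k)
    return sum(count for _, count in top) / len(tokens)


def zipf_deviation(tokens):
    """Mean relative deviation of rank-frequency counts from f(1) / rank.

    Hypothetical metric: 0.0 means the counts follow the ideal Zipf
    curve exactly; larger values mean a worse fit.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    if not counts:
        return 0.0
    top = counts[0]
    deviations = [
        abs(count - top / rank) / (top / rank)
        for rank, count in enumerate(counts, start=1)
    ]
    return sum(deviations) / len(deviations)


if __name__ == "__main__":
    sample = tokenize("The model of the text and the model of style")
    print(stop_word_ratio(sample))
    print(keyword_rate(sample, top_k=1))
    print(zipf_deviation(sample))
```

In the study, values like these are compared against thresholds derived from the distributions over the 2500 reference articles. Those thresholds are not stated in the abstract, so none are hard-coded here.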


Keywords: scientific style, text analysis, Zipf’s law, scientific articles review automation



Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License